Load-n-Go: Fast Approximate Join Visualizations That Improve Over Time
Authors
Abstract
Visual exploratory analysis of large-scale databases often relies on precomputed query results to guarantee interactivity in the visualization system. This is especially true when a query joins tables across the database, since a join is one of the most computationally expensive operations and, in the worst case, requires pairing every row of one table with every row of the other. Advances in approximate query processing enable quick-look summary statistics, but they limit the types and complexity of the queries that can be answered. A recent advance in approximate query processing is Wander Join, a technique that performs random walks across these joins, resulting in faster convergence of aggregate values. However, this online aggregation technique is not tailored to visualization tasks, which often involve filtering on one or more conditions or viewing aggregate values across multiple groups, as in a bar chart. In both cases convergence slows because more samples are needed to find records that pass the filters, so the user waits longer for a confident result. To address this issue, we propose a generalization of the Wander Join algorithm that improves the convergence rate for visualization queries involving filtering and grouped aggregation. We implemented this improved version of Wander Join, which we call Load-n-Go, and compared it to the original, specifically in the context of visual analysis tasks. Our evaluation finds that our algorithm outperforms Wander Join by reducing the sample complexity: Load-n-Go requires up to 50% fewer samples for group-by queries and up to 85% fewer samples for filtering queries. This reduction in sample complexity can translate into speedups of up to 2x and 6x, respectively, for visual exploratory systems using the Load-n-Go algorithm.
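To make the random-walk idea concrete, the following is a minimal sketch of a basic Wander Join style estimator for a SUM over a two-table join. It is not the Load-n-Go generalization, whose details the abstract does not give. The function names (`build_index`, `wander_join_sum`), the row-as-dictionary layout, and the `predicate` parameter are all assumptions made for illustration. Each walk picks a left row uniformly, then a matching right row uniformly, and weights the sampled value by the inverse of the path probability; walks that find no match or fail the filter contribute zero, which is why selective filters need more walks for a confident estimate.

```python
import random
from collections import defaultdict


def build_index(rows, key):
    """Group rows by join key so a walk can find matches quickly (hypothetical helper)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def wander_join_sum(left, right_index, key, value, n_walks, predicate=None, rng=None):
    """Estimate SUM(right[value]) over left JOIN right using random walks.

    Sketch only: a two-table join with uniform sampling at each step.
    `predicate(left_row, right_row)` is an assumed way to express a filter.
    """
    rng = rng or random.Random()
    total = 0.0
    for _ in range(n_walks):
        left_row = rng.choice(left)                  # step 1: probability 1/len(left)
        matches = right_index.get(left_row[key], [])
        if not matches:
            continue                                 # dead-end walk contributes 0
        right_row = rng.choice(matches)              # step 2: probability 1/len(matches)
        if predicate is not None and not predicate(left_row, right_row):
            continue                                 # filtered-out walk contributes 0
        # Weight the value by the inverse of the path's sampling probability.
        total += right_row[value] * len(left) * len(matches)
    return total / n_walks


# Toy tables (made up for illustration).
orders = [{"cust": c} for c in range(4)]
payments = [{"cust": c, "amount": c + 1} for c in range(4)]
payment_index = build_index(payments, "cust")

estimate = wander_join_sum(orders, payment_index, "cust", "amount",
                           n_walks=20000, rng=random.Random(0))
filtered = wander_join_sum(orders, payment_index, "cust", "amount",
                           n_walks=20000, rng=random.Random(0),
                           predicate=lambda l, r: r["amount"] >= 3)
```

In this toy data the exact join sum is 10, and 7 with the filter applied; both estimates approach those values as `n_walks` grows, with the filtered estimate fluctuating more because half of its walks contribute nothing.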
Similar resources
Polypharmacy vs. Deprescribing
Elderly patients and patients with chronic diseases often visit several doctors and are prescribed multiple medications. Taking multiple medications is called polypharmacy; it may cause side effects, drug interactions, and other health issues for the patient, and may even require the patient to be hospitalized and admitted to the ICU, or go even further and be the cause ...
Approximate Query Processing: Taming the TeraBytes
Garofalakis & Gibbons, VLDB 2001
Outline
• Intro & Approximate Query Answering Overview: Synopses, System architecture, Commercial offerings
• One-Dimensional Synopses: Histograms, Samples, Wavelets
• Multi-Dimensional Synopses and Joins: Multi-D Histograms, Join synopses, Wavelets
• Set-Valued Queries: Using Histograms, Samples, Wavelets
• Advanced Techniques & Future Directions: Stre...
iJoin: Importance-Aware Join Approximation over Data Streams
We consider approximate join processing over data streams when memory limitations cause incoming tuples to overflow the available space, precluding exact processing. Selective eviction of tuples (load shedding) is needed but is challenging, since data distributions and arrival rates are unknown a priori. Also, in many real-world applications, such as stock market and sensor data, differen...
Pay-as-you-go Approximate Join Top-k Processing for the Web of Data Technical Report
To search the Web of data effectively, ranking of results is crucial. Top-k processing strategies have been proposed to allow efficient processing of such ranked queries. Top-k strategies aim at computing the k top-ranked results without complete result materialization. However, for many applications result computation time is much more important than result accuracy and completeness. Thus...
Pay-as-you-go Approximate Join Top-k Processing for the Web of Data
To search the Web of data effectively, ranking of results is crucial. Top-k processing strategies have been proposed to allow efficient processing of such ranked queries. Top-k strategies aim at computing the k top-ranked results without complete result materialization. However, for many applications result computation time is much more important than result accuracy and completeness. Thus...
Journal title:
Volume, issue:
Pages: -
Publication date: 2017